THE ARCHITECTURE OF A DATABASE COMPUTER
PART I: CONCEPTS AND CAPABILITIES*

Richard I. Baum, David K. Hsiao and Krishnamurthi Kannan**

INTRODUCTION 

A hardware architecture for a database computer (DBC) is given in this
paper. The proposed design overcomes many of the traditional problems of
database system software and is one of the first to describe a complete
data-secure computer capable of handling large databases.

This paper begins by characterizing the major problems facing today's
database system designers. These problems are intrinsically related to the
nature of conventional hardware and can only be solved by introducing new
architectural concepts. Several such concepts are brought to bear in the later
sections of the paper. These architectural principles have a major impact
upon the design of the system, so they are discussed in some detail. A
key aspect of these principles is that they can be implemented with near-term
technology. The rest of the paper is devoted to the functional
characteristics and the theory of operation of the DBC. The theory of
operation is based on a series of abstract models of the components and data
structures employed by the DBC. These models are used to illustrate how
the DBC performs access operations, manages data structures and security
specifications, and enforces security requirements. Short ALGOL-like
algorithms are used to show how these operations are carried out. This
part of the paper concludes with a high-level description of the DBC
hardware. The actual details of the DBC hardware are quite involved, so
their presentation is the subject of Part II and Part III of this paper.


*This research is supported by the Office of Naval Research, Contract
N00014-75-C-0573, and conducted at The Ohio State University.

**The last two authors are with the Department of Computer and Information
Science, The Ohio State University, Columbus, Ohio; the first author is
with IBM, Poughkeepsie, N.Y.
1. PROBLEMS AND SOLUTIONS

A number of major problems have faced database designers for a long
time. These problems are of a very general nature and have frequently
plagued builders of both hardware and software database systems. This
section of the paper contains a discussion of these problems and of the
architectural principles adopted in the DBC design which solve them.

1.1  The  Problems  of  Database  System  Software 


A.  Name  Mapping  Complexity 

The complexity of database system software is due, in large part, to the
requirement for name-mapping operations. Name-mapping operations convert
symbolic data names, called a query, into memory addresses which identify
where the data named by the query can be found. Since the language which
is used to name data is usually far more powerful than the addressing
scheme implemented by the hardware, rather involved name-mapping
algorithms are normally necessary. Name-mapping algorithms must be highly
optimised if they are to perform well. In particular, these algorithms
must minimize their secondary storage access requirements. To accomplish
this, most name-mapping algorithms use very complex auxiliary data
structures to guide their operation.

To illustrate these problems, consider the following typical name-mapping
scenario. First, the query is used to access a "directory". The directory
contains information which allows the algorithm to determine the
approximate location of the requested data (this information thus
potentially reduces the number of secondary storage accesses required by
the algorithm). The information retrieved from the directory is then
processed in some manner to yield secondary storage addresses. Finally,
the secondary storage is accessed and the data is located. This software
name-mapping algorithm requires auxiliary data structures in both the
directory and the secondary storage. These auxiliary data structures,
which include elements such as pointers, allow rapid retrieval of data
from the secondary storage and the directory.

All of these auxiliary data structures must be properly maintained.
This last requirement is the underlying cause of the great difficulty
most contemporary systems have in executing update operations. Update
operations make changes to auxiliary structures and so they are frequently
very time consuming. A classic example is the process of modifying a
network of pointers.
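
The update problem can be made concrete with a small sketch. The Python below models a conventional directory of location pointers; it is a hypothetical illustration, not code from any actual system, and the names `PointerDirectory`, `insert`, `lookup` and `move_record` are invented here. Each keyword maps to a list of secondary-storage addresses, so relocating a single record forces every directory entry that points to it to be rewritten.

```python
from collections import defaultdict


class PointerDirectory:
    """Hypothetical conventional directory: keyword -> list of physical addresses."""

    def __init__(self):
        self.entries = defaultdict(list)   # keyword -> [address, ...]
        self.storage = {}                  # address -> record (a set of keywords)

    def insert(self, address, record):
        self.storage[address] = record
        for keyword in record:
            self.entries[keyword].append(address)

    def lookup(self, keyword):
        # Name mapping: access the directory, then the secondary storage.
        return [self.storage[a] for a in self.entries.get(keyword, [])]

    def move_record(self, old_address, new_address):
        """Relocate a record and repair every pointer to it.

        Returns the number of directory entries that had to be rewritten.
        """
        record = self.storage.pop(old_address)
        self.storage[new_address] = record
        rewritten = 0
        for keyword in record:
            addresses = self.entries[keyword]
            for i, a in enumerate(addresses):
                if a == old_address:
                    addresses[i] = new_address
                    rewritten += 1
        return rewritten


d = PointerDirectory()
d.insert(100, {("Dept", "Sales"), ("Salary", 9000)})
d.insert(101, {("Dept", "Sales")})
moved = d.move_record(100, 250)   # both keywords of record 100 must be repaired
```

Moving one record with two keywords rewrites two directory entries; with more keywords, or with chains of pointers between records, the repair work grows accordingly.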

B.  Performance  Bottlenecks 

Database system software normally consists of several distinct
functional parts which perform specific tasks. For example, separate
software modules which perform query parsing, directory access, directory
processing, data retrieval and update, and data security are usually
found in contemporary database management systems. To have a well-balanced
system with high throughput, these modules need widely differing
performance capabilities. Such differences are difficult to achieve when
all of the modules are implemented on the same underlying hardware. When
the required performance cannot be met because of inherent hardware
constraints, the system develops bottlenecks and its performance is
consequently degraded. Contemporary database management systems are
usually plagued by many such bottlenecks.

C.  Data  Security  Overhead 

Powerful data security facilities are generally a performance hindrance
on contemporary systems. The most powerful data security mechanisms allow
security specifications to be written in the query language of the system.
To authenticate access operations it is therefore necessary to perform
multiple name-mapping operations: one for determining the requested data
and several for determining the data affected by the security
specifications. The use of name-mapping algorithms to carry out security
enforcement is generally too much of a performance burden to be seriously
considered in present systems.

D.  Add-on  Approach  to  Security 

Security capabilities are frequently just an "add-on" to present
systems. This kind of design philosophy opens the way not only to
performance difficulties but also to questionable reliability. With the
high degree of complexity of current systems, it is extremely difficult to
add on a security mechanism which will guarantee that all "backdoors" are,
in fact, closed.


1.2  The  Problems  of  Building  DBC  Hardware 

A. The Need for Distant Technology

Attempts to build database computer hardware have been made before
[1,2,3,4,5]. These efforts have been plagued by a number of critical
drawbacks. The most serious shortcoming in these systems has been their
reliance on monolithic, fully associative memories. Such memories are not
feasible for supporting a large on-line database (i.e., at least $10^9$
bytes).


B.  Incomplete  Hardware  Designs 

Many DBC attempts [1,6,7,8,9,10] have led to machines that could not
perform all of the functions necessary to support a viable database
management system. In particular, some of them can support just one
database management function in hardware, such as directory processing or
data retrieval; others cannot support a critical function, such as update,
well. Previous DBC approaches have almost always lacked a data security
capability; such an omission makes the use of the computer in a
data-sharing environment very questionable indeed. A viable DBC must
support all database management functions equally well.

1.3  Problem  Solving  Concepts 

To overcome the problems described above, a number of key design
concepts were used in the DBC. These design concepts include both
architectural principles and design philosophy.

A.  Partitioned  Content  Addressable  Memories 

The use of hardware content addressing can significantly reduce the
need for name-mapping data structures. Content addressable memories
eliminate the need for knowing the actual location of a data item. In
such a memory the notion of "actual location" is nonexistent; instead,
all data is accessed by specifying its attributes. This kind of access
gives us a very important capability: data items may be moved about
without any need to modify name-mapping data structures, because few, if
any, such structures are needed in a content addressable memory. This
characteristic greatly facilitates update operations.
A fully associative memory large enough to hold a complete database
is not feasible. However, a storage system consisting of many blocks
(called partitions) of memory, each of which is content addressable, is
quite feasible. We call this memory concept a partitioned content
addressable memory (PCAM). It is possible to build PCAMs of widely varying
performance characteristics. In particular, the access speed and capacity
of a PCAM can be designed to meet a particular performance requirement.
This flexibility allows us to design three PCAMs with very different speeds
and capacities for use in the proposed DBC architecture.

As will be seen later, a PCAM of gigabyte capacity is feasible with
current technology.
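
A partitioned content addressable memory can be sketched in a few lines of Python. This is a hypothetical software model, not the DBC hardware: the names `PCAM`, `store`, `search` and `reorganize` are invented here, records are assumed to be dicts of attribute-value pairs, and the parallel search of a real partition is simulated by a sequential scan. What it shows is that a search names only a partition and a content predicate, so rearranging records inside a partition invalidates nothing.

```python
class PCAM:
    """Hypothetical model of a partitioned content addressable memory.

    Each partition is an unordered collection of records (dicts mapping
    attribute -> value). A search names a partition and a content
    predicate, never a position within the partition.
    """

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def store(self, partition, record):
        self.partitions[partition].append(record)

    def search(self, partition, predicate):
        # Real PCAM hardware examines the partition's records in parallel;
        # a sequential scan yields the same result set.
        return [r for r in self.partitions[partition] if predicate(r)]

    def reorganize(self, partition):
        # Move records about inside the partition (here simply reversing
        # their order). No name-mapping structure needs updating, because
        # search() never depends on a record's position.
        self.partitions[partition].reverse()


def low_paid(record):
    return record["Salary"] < 10000


mem = PCAM(num_partitions=4)
mem.store(2, {"Name": "Smith", "Salary": 9000})
mem.store(2, {"Name": "Jones", "Salary": 12000})
before = mem.search(2, low_paid)
mem.reorganize(2)
after = mem.search(2, low_paid)   # same answer as before the move
```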


B.  Structure  and  Mass  Memories 

Since PCAMs are block-oriented, it is necessary to have some name-mapping
data structures in the system. Our goal, of course, is to minimize their
use as much as possible. This leads to the architectural concept of the
structure memory. A DBC employing this concept has two memories. The mass
memory contains the information making up the database and is by far the
larger of the two memories. The mass memory contains only update-invariant
name-mapping data structures. Once an update-invariant data structure is
created for a data item, it need never be modified so long as that data
item continues to exist anywhere in the database. The data structures in
conventional mass storage are not update invariant; they must be modified
whenever the location of a data item changes. The structure memory contains
all of the non-update-invariant name-mapping information necessary to
locate data in the mass memory. To access the database the system first
accesses the structure memory, obtains mapping information, processes it
and then accesses the mass memory.
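
The two-step access path through the structure memory and the mass memory might be modelled as follows. This is a minimal sketch under stated assumptions: the classes `StructureMemory` and `MassMemory` and the helpers `store` and `access` are hypothetical, a keyword is an (attribute, value) pair, and the structure memory is reduced to a mapping from each keyword to the set of partitions whose records contain it. The mass memory side holds no pointers at all, only records found by content.

```python
from collections import defaultdict

# Hypothetical sketch: records are dicts attribute -> value, and a keyword is
# an (attribute, value) pair.


class MassMemory:
    """Holds the records themselves in content-addressable partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def search(self, partition, keyword):
        attribute, value = keyword
        return [r for r in self.partitions[partition] if r.get(attribute) == value]


class StructureMemory:
    """Holds the mapping information: keyword -> set of mass-memory partitions."""

    def __init__(self):
        self.index = defaultdict(set)

    def note_placement(self, record, partition):
        for keyword in record.items():
            self.index[keyword].add(partition)

    def partitions_for(self, keyword):
        return sorted(self.index.get(keyword, set()))


def store(sm, mm, record, partition):
    mm.partitions[partition].append(record)
    sm.note_placement(record, partition)


def access(sm, mm, keyword):
    # 1. Access the structure memory and obtain the mapping information.
    # 2. Process it (here: iterate over the partitions it names).
    # 3. Access the mass memory, searching only those partitions by content.
    results = []
    for partition in sm.partitions_for(keyword):
        results.extend(mm.search(partition, keyword))
    return results


sm, mm = StructureMemory(), MassMemory(3)
store(sm, mm, {"Dept": "Sales", "Name": "Smith"}, 0)
store(sm, mm, {"Dept": "Sales", "Name": "Jones"}, 2)
store(sm, mm, {"Dept": "Toys", "Name": "Brown"}, 1)
```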

The  proposed  DBC  employs  the  structure  memory  concept.  Both  the  mass 
memory  and  the  structure  memory  are  PCAMs.  They,  of  course,  have  very 
different  functional  characteristics. 


C. Area Pointers

To simplify the name-mapping data structures that are still required
by the DBC, a concept called the area pointer is used. An area pointer
indicates the PCAM partition in which a data item may be found by
employing content addressing. Unlike the location pointers used in
contemporary systems, area pointers need not be modified when data items
are moved around within a partition.


Conventional mass memories do not support the area pointer concept.
Our mass memory, on the other hand, is a PCAM, and so area pointer support
comes naturally. Area pointers are stored in and managed by the structure
memory.
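
The contrast between a location pointer and an area pointer can be shown with a short example. In this hypothetical sketch a location pointer is a (partition, offset) pair and an area pointer is a partition number only; the helper names `deref_location` and `deref_area` are invented for illustration. Deleting one record and compacting the partition moves the remaining record to a new slot, which leaves the location pointer stale while the area pointer still finds the record by content.

```python
def deref_location(partitions, pointer):
    """Location pointer = (partition, offset): names one exact slot."""
    partition, offset = pointer
    slots = partitions[partition]
    return slots[offset] if offset < len(slots) else None


def deref_area(partitions, area_pointer, keyword):
    """Area pointer = partition number only; the record is found by content."""
    attribute, value = keyword
    return [r for r in partitions[area_pointer] if r.get(attribute) == value]


smith = {"Name": "Smith", "Dept": "Sales"}
jones = {"Name": "Jones", "Dept": "Toys"}
partitions = [[smith, jones]]

location_ptr = (0, 1)   # points at jones's slot
area_ptr = 0            # points at jones's partition

# Delete smith and compact the partition: jones moves from slot 1 to slot 0.
partitions[0].remove(smith)

stale = deref_location(partitions, location_ptr)            # None: pointer is stale
found = deref_area(partitions, area_ptr, ("Name", "Jones"))  # jones is still found
```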


D.  Functional  Specialization 

The DBC contains a number of components with considerably different
processing speed and memory capacity requirements. The mass memory and
structure memory are examples of two such components. To keep any component
from becoming a bottleneck we employ the architectural concept of functional
specialization***. In a functionally specialized system, the components are
individually designed to be optimally adapted to their function. The
processing power and memory capacity of each component is determined by its
role in the system. Because all major components are specialized (i.e.,
functionally separate from other components), estimation of their required
processing power and memory capacity is much easier. In the proposed DBC
each of the major components is a physically separate hardware component.
This approach allows us to build a relatively well-balanced system and to
avoid bottlenecks by providing each component with the right amount of
processing power and memory capacity.

The proposed DBC has seven major functionally specialized components:
the keyword transformation unit (KXU), the structure memory (SM), the mass
memory (MM), the structure memory information processor (SMIP), the index
translation unit (IXU), the database command and control processor (DBCCP),
and the security filter processor (SFP). These seven components are the
heart of a database computer that is able to support gigabyte database
capacities while providing full retrieval, update and security capabilities.


E.  Look-Aside  Buffering 

When an update operation occurs it is sometimes necessary to modify
name-mapping data structures. To ensure the correct execution of the
queries which follow the update, the execution of queries is normally
postponed until the update operation and all of its related changes to data
structures are complete. This is because the data involved in an update
operation could very well be the data the next operation depends on. Thus
update operations can become bottlenecks in contemporary systems.

***This term was suggested to us by E. Feustel.

In our DBC, the changes to name-mapping data structures induced by an update
operation will be far fewer in number and much simpler than in contemporary
systems; nevertheless, they will require some time. To reduce the
wait time, a look-aside buffer is used to store update commands temporarily.
This buffer makes the results of updates immediately available to
the rest of the system before they are permanently recorded. These changes
can be stored in the look-aside buffer in much less time than it would
take to record them permanently in the system. In this way, queries
following an update operation do not have to wait for the permanent effects
of that update operation to be stored before they are executed.
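
A look-aside buffer can be sketched as a small overlay in front of the permanent store. The class `LookAsideStore` below is hypothetical: records are keyed by an invented record identifier, the buffer holds the resulting record (or a deletion marker) rather than the update command itself, and `flush` stands in for the slower permanent recording in the structure and mass memories. Reads consult the buffer first, so a query issued right after an update sees its effect.

```python
class LookAsideStore:
    """Hypothetical sketch: buffered updates are visible before being made permanent."""

    _DELETED = object()   # marker for a buffered deletion

    def __init__(self):
        self.permanent = {}   # stands in for the slow structure/mass memories
        self.pending = {}     # the look-aside buffer: record id -> new value

    def update(self, rid, record):
        self.pending[rid] = record          # fast: only the buffer is touched

    def delete(self, rid):
        self.pending[rid] = self._DELETED

    def read(self, rid):
        # Queries consult the buffer first, so they see updates immediately.
        if rid in self.pending:
            value = self.pending[rid]
            return None if value is self._DELETED else value
        return self.permanent.get(rid)

    def flush(self):
        # Later, the buffered changes are recorded permanently.
        for rid, value in self.pending.items():
            if value is self._DELETED:
                self.permanent.pop(rid, None)
            else:
                self.permanent[rid] = value
        self.pending.clear()


s = LookAsideStore()
s.permanent[1] = {"Salary": 9000}
s.update(1, {"Salary": 9500})   # visible at once through read(1)
```

A keyword-based query would also have to merge buffered changes with the records found in the mass memory; this sketch covers only lookup by identifier.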

F.  An  Integral  Data  Security  Mechanism 

At the outset the security mechanism was made an integral part of the
DBC design. This design philosophy not only allows us to construct a system
that has no "backdoors" but also ensures that all access requests are, in
fact, controlled by the DBC's security mechanism. We achieved this by
designing the security mechanism first and then designing the rest of
the system around it. The DBC supports a security specification language
that is the same as the DBC's query language.

Security in the DBC is provided in terms of two distinct protection
mechanisms. The first mechanism, based on the security atom concept [14],
requires some form of cooperation from the creator of the database. This
mechanism achieves enforcement in a rapid and elegant manner and is
incorporated in the DBCCP. The second enforcement mechanism allows the
creator wide latitude in the manner in which he can specify security-related
information. Since it generally requires more (and different) processing
than the first, the second mechanism is incorporated in a functionally
specialized component, the security filter processor (SFP). Such an
architecture tends to lead to good performance while ensuring that security
is not compromised.


G.  Performance  Enhancement  by  Clustering  Techniques 

A powerful clustering technique has been incorporated in the DBC, which
allows the creator of the database to optimize access times. The placement
of every record into the DBC can be controlled (in terms of its properties)
by the creator of the database in such a way that records with similar
properties can be retrieved with minimal access delays.
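
One way to picture creator-controlled clustering is to place records into partitions according to the value of an attribute the creator chooses. The policy below (one partition per distinct value of `cluster_attribute`) and the function names `place_records` and `partitions_touched` are hypothetical; the DBC's clustering technique is not specified here and may differ. The example counts how many partitions a retrieval must visit under two different placements of the same records.

```python
from collections import defaultdict


def place_records(records, cluster_attribute):
    """Group records into partitions by the value of one chosen attribute.

    Hypothetical policy: one partition per distinct value of cluster_attribute.
    """
    partitions = defaultdict(list)
    for record in records:
        partitions[record[cluster_attribute]].append(record)
    return dict(partitions)


def partitions_touched(partitions, predicate):
    """Count the partitions holding at least one record that satisfies predicate."""
    return sum(1 for recs in partitions.values() if any(predicate(r) for r in recs))


records = [
    {"Name": "Smith", "Dept": "Sales", "City": "Columbus"},
    {"Name": "Jones", "Dept": "Toys", "City": "Dayton"},
    {"Name": "Brown", "Dept": "Sales", "City": "Dayton"},
    {"Name": "Green", "Dept": "Toys", "City": "Columbus"},
]


def in_sales(record):
    return record["Dept"] == "Sales"


by_dept = place_records(records, "Dept")        # clustered on Dept
by_city = place_records(records, "City")        # clustered on City
print(partitions_touched(by_dept, in_sales))    # 1
print(partitions_touched(by_city, in_sales))    # 2
```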


H. Advanced Technology

A database computer for the near future should take maximum advantage
of the technology that is likely to be available then. This design
philosophy is especially important in an era of rapidly developing
technology such as the present one. The significant developments expected
in the area of high-speed bulk storage (semiconductors: CCDs and dense RAMs,
magnetic bubbles and electron beam memories) and low-cost processing power
(microprocessors) dictate a major rethinking of conventional machine
architectures.

For example, an all-electronic storage component may replace the fixed-head
disk as the fastest bulk storage device in the system. Since these
all-electronic fixed-head disk replacements will offer at least an order
of magnitude improvement in access time, they will allow powerful data
organizations that were previously infeasible to become practical, as well
as allowing a significant increase in the throughput of certain database
system components. Low-cost random access memory will allow the widespread
use of very large data buffers and independent functionally specialized
memories throughout the system. Low-cost microprocessors coupled with
low-cost bulk memory will allow parallel processing techniques to be used
to construct memories with powerful search capabilities.
2. THE FUNCTIONAL CHARACTERISTICS OF THE DATABASE COMPUTER

The database computer must communicate with external systems, and so
a DBC interface must be defined. The functional characteristics of the
DBC provide such an interface. The DBC functional characteristics define
the data management and security features supported by the DBC and show
how commands are sent to and executed by it.

2.1  A Back-end  Machine 

The DBC is not a general-purpose computer and does not have a typical
operating system. Instead, it is a separate machine dedicated to database
operations. Other computers and systems communicate with the DBC by using
DBC access commands and by sending or receiving database information. The
decision to design the DBC as a back-end machine to support database
operations in a general-purpose computer system is a result of applying the
concept of functional specialization. A number of advantages accrue from
this decision [11]. First, the DBC is not constrained to be used with a
particular kind of general-purpose computer system. Second, more than one
system can share a DBC. In this way, the back-end DBC can serve many
front-end computer systems. Third, several DBCs can become part of a
general-purpose computer system to facilitate distributed database
applications. This interconnection could be done with a geographically
widespread communications network. Finally, all DBC access channels can be
identified and controlled. This is necessary to ensure that no "backdoors"
into or out of the DBC exist.

We shall collectively call all of the systems which communicate with
the DBC the program execution system (PES). We aggregate all these systems
into one conceptual entity so that it will be easier to describe the
operation of the DBC.

2.2  The  Functional  Model 

The DBC proposed here implements the attribute-based model. This
model has been extensively studied and is particularly well-suited to
supporting contemporary database functions [12,13,14].
A. Queries — The Symbolic Data Names Used by the DBC

Our definition of a database starts with two terms: a set AT of
"attributes" and a set VA of "values". These are left undefined to
allow the broadest possible interpretation. We shall denote a member
of AT by at and a member of VA by v.

A record R is a subset of the Cartesian product $AT \times VA$. To simplify
the notation we will assume without loss of generality that in a record
all attributes are distinct. Thus, R is a set of ordered pairs of the
form:


(an  attribute,  a value) 


Records are physically stored in the mass memory. The set of all
records in the mass memory is called a database (DB). The database may
be partitioned into subsets called files. To distinguish among several
files, each file is given a unique name F, called its file name.

The  keywords  of  a record  are  those  attribute-value  pairs  which 
characterize  the  record.  In  practice  it  is  useful  to  consider  only 
succinct  keywords.  We  shall  denote  a keyword  by  the  notation  K. 

A keyword predicate T(K) is true for a keyword K if K satisfies the
condition specified by T. The most commonly used keyword predicate is
the equality predicate E(K), which is true for K when K is the same as a
certain keyword, say, K'. For this special case, we shall denote the
keyword predicate simply by K'. Another common keyword predicate is the
less-than predicate $LT_{at,v}(K)$. This predicate is true for K when the
attribute of K is at and the value of K is less than some value, say, v.
This keyword predicate shall be denoted by (at < v). This predicate can
easily be generalized to handle other relational operators. All queries
are made up of Boolean expressions of keyword predicates. Keyword
predicates allow queries to specify just about any conceivable keyword
property.
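
The keyword predicates just described map naturally onto small functions. In this sketch (a hypothetical representation, not the DBC's) a keyword is an (attribute, value) tuple, `equality` builds the predicate written simply K', and `relational` builds (at < v), or any other relational predicate, from a comparison function taken from Python's standard `operator` module.

```python
import operator

# Hypothetical representation: a keyword K is an (attribute, value) tuple and a
# keyword predicate is a function K -> bool.


def equality(keyword_prime):
    """E(K): true when K is the same as keyword_prime (the predicate written K')."""
    return lambda k: k == keyword_prime


def relational(attribute, op, value):
    """(attribute op value): true when K has this attribute and op(K's value, value)."""
    return lambda k: k[0] == attribute and op(k[1], value)


is_sales = equality(("Dept", "Sales"))                       # K' = (Dept, Sales)
salary_below = relational("Salary", operator.lt, 10000)      # (Salary < 10,000)
salary_at_least = relational("Salary", operator.ge, 10000)   # (Salary >= 10,000)
```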

A keyword predicate is true for a record R if some keyword K in R
satisfies the keyword predicate. A query is a proposition given by a
Boolean expression of keyword predicates. A query is true for R if this
proposition holds for the keywords in R; such a record is said to satisfy
the query. The set of all records in DB (or in a file F of DB) that
satisfy a query Q will be called its response set and denoted by Q(DB)
(or Q(F)). Every query is written in disjunctive normal form:

$$Q_1 \lor Q_2 \lor \dots \lor Q_k$$

where each conjunct $Q_i$ of the query has the form

$$T_1^i \land T_2^i \land \dots \land T_n^i$$

where the $T_j^i$ are keyword predicates. Some examples of queries follow.
The query $K_i \land K_j$ is true for R when $K_i$ and $K_j$ are both in R.
The query $K_i \land (\text{Salary} < 10{,}000)$ is true for R when $K_i$ is
in R and there is a keyword in R whose attribute is Salary and whose value
is less than 10,000. More elaborate queries can be formed in the same way,
provided they are in disjunctive normal form.
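
Query evaluation over records follows directly from these definitions. The sketch below is illustrative only: a record is assumed to be a Python dict (so attributes are distinct, as the text assumes), a query in disjunctive normal form is a list of conjuncts, each conjunct a list of keyword predicates, and the helper names `eq`, `rel`, `predicate_holds`, `satisfies` and `response_set` are invented here.

```python
import operator


def eq(attribute, value):
    """Equality keyword predicate: true for K == (attribute, value)."""
    return lambda k: k == (attribute, value)


def rel(attribute, op, value):
    """Relational keyword predicate (attribute op value), e.g. (Salary < 10000)."""
    return lambda k: k[0] == attribute and op(k[1], value)


def predicate_holds(predicate, record):
    # A keyword predicate is true for R if some keyword K in R satisfies it.
    return any(predicate(k) for k in record.items())


def satisfies(record, query):
    # DNF: Q1 OR Q2 OR ... OR Qk, each Qi an AND of keyword predicates.
    return any(all(predicate_holds(t, record) for t in conjunct) for conjunct in query)


def response_set(records, query):
    """Q(F): the records of file F that satisfy query Q."""
    return [r for r in records if satisfies(r, query)]


file_f = [
    {"Name": "Smith", "Dept": "Sales", "Salary": 9000},
    {"Name": "Jones", "Dept": "Sales", "Salary": 12000},
    {"Name": "Brown", "Dept": "Toys", "Salary": 8000},
]

# K_i AND (Salary < 10,000), with K_i = (Dept, Sales): a single conjunct.
q1 = [[eq("Dept", "Sales"), rel("Salary", operator.lt, 10000)]]
# The same conjunct OR'ed with a second conjunct, (Dept, Toys).
q2 = q1 + [[eq("Dept", "Toys")]]
```

Here the response set of `q1` over `file_f` is Smith's record alone, while `q2` also admits Brown's.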

B. Security Specifications — The Protection of Data

A database access, or simply an access, is the name of a DBC operation
which transfers information to or extracts information from DB. Examples
of accesses are retrieve, insert and delete. Let ACC denote the set of
the names of all the accesses available in the DBC. Let a member of ACC be
represented by a and a subset of ACC by A.

A security specification is a relation

$$S: DB \rightarrow 2^{ACC}$$

where $2^{ACC}$ is the power set of ACC. Thus, for a record R in DB, the
security specification $S(R) = A$ indicates which subset A of accesses is
permitted on R.

A file sanction, or simply a sanction, is defined as the couple (Q,A),
where Q is a query and A is a subset of ACC. A sanction (Q,A) induces
a relation $S.FS_{Q,A}$ over records R of the database such that

$$S.FS_{Q,A}(R) = \begin{cases} A & \text{if } R \text{ satisfies } Q, \\ ACC & \text{otherwise.} \end{cases}$$

Thus, a sanction induces a security specification which indicates that only
the accesses in A may be performed on the records satisfying Q. When R
does not satisfy Q, all accesses may be performed on it. In this case we
say that the sanction (Q,A) is not applicable to R. The sanction is a very
powerful type of security specification since it allows the full power of
the query language (i.e., Q) to be used to specify records to be protected.
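
The security specification induced by a single sanction can be written as a higher-order function. Here ACC is assumed, for illustration only, to contain just the three accesses named in the text (retrieve, insert, delete), the query Q is stood in for by any Python function from a record to a Boolean, and `sanction_spec` is a hypothetical name.

```python
# Hypothetical access set: only the three accesses named in the text.
ACC = frozenset({"retrieve", "insert", "delete"})


def sanction_spec(query, allowed):
    """Return S.FS_{Q,A}: A on records satisfying Q, ACC on every other record.

    `query` is any function record -> bool, standing in for a DNF query Q;
    `allowed` is the subset A of ACC.
    """
    allowed = frozenset(allowed)
    if not allowed <= ACC:
        raise ValueError("A must be a subset of ACC")
    return lambda record: allowed if query(record) else ACC


def is_payroll(record):
    return record.get("Dept") == "Payroll"


# Sanction (Q, A) with Q = (Dept = Payroll) and A = {retrieve}.
payroll_read_only = sanction_spec(is_payroll, {"retrieve"})
```

A Payroll record then admits only retrieval, while any other record, to which the sanction is not applicable, admits every access in ACC.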


Consider a file named F and a set of sanctions S, where

$$S = \{(Q_1,A_1), (Q_2,A_2), \dots, (Q_m,A_m)\}.$$

A database capability (F,S) induces a security specification $S.DC_{F,S}$
over the records R of F such that

$$S.DC_{F,S}(R) = \bigcap_{i=1}^{m} S.FS_{Q_i,A_i}(R).$$

In words, $S.DC_{F,S}(R)$ is the set of all accesses granted for R by one or
more file sanctions in S and not denied by any sanction of S. Security
specifications are therefore stored in the DBC as database capabilities.

The  database  capabilities  specify  exactly  what  access  operations  are 
allowed  on  records.  The  DBC  maintains  database  capabilities  for  each 
active  user. 

For example, consider the database capability $\{(Q_1,A_1), (Q_2,A_2)\}$.
Suppose $Q_1$ and $Q_2$ specify overlapping sets of records as shown in
Figure 1. Then the records in the intersection of $Q_1$ and $Q_2$ have the
access privileges $A_1 \cap A_2$ associated with them.
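
The database capability combines its sanctions by intersection, which reproduces the situation of Figure 1. The sketch assumes, for illustration only, a three-element ACC and queries given as Python functions from a record to a Boolean; the names `sanction_spec` and `capability_spec`, and the queries and access sets in the example, are invented to show the overlapping case.

```python
from functools import reduce

# Hypothetical access set: only the three accesses named in the text.
ACC = frozenset({"retrieve", "insert", "delete"})


def sanction_spec(query, allowed):
    """S.FS_{Q,A}: A on records satisfying Q, ACC otherwise."""
    allowed = frozenset(allowed)
    return lambda record: allowed if query(record) else ACC


def capability_spec(sanctions):
    """S.DC_{F,S}: the intersection of S.FS_{Qi,Ai}(R) over every sanction in S."""
    specs = [sanction_spec(q, a) for q, a in sanctions]
    return lambda record: reduce(lambda acc, spec: acc & spec(record), specs, ACC)


# Overlapping queries as in Figure 1 (invented for illustration).
def q1(record):
    return record["Salary"] < 10000


def q2(record):
    return record["Dept"] == "Sales"


dc = capability_spec([(q1, {"retrieve", "insert"}), (q2, {"retrieve", "delete"})])

in_both = {"Salary": 9000, "Dept": "Sales"}    # shaded area: A1 & A2 = {retrieve}
only_q1 = {"Salary": 9000, "Dept": "Toys"}     # only (Q1, A1) applies
neither = {"Salary": 20000, "Dept": "Toys"}    # no sanction applies: all of ACC
```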

C.  Command  Execution  — The  Processing  of  Access  Requests 

An access command has the form <U,(F,Q),a> or the form <U,(F,R),a>.
U represents the name of the user issuing the command, a is an access,
(F,Q) represents the response set Q(F) on which the access is to be
performed, and (F,R) represents a record R of F that is to be used in the
access. Before an access is executed, file F must be protected from
unauthorized access by the user U. This is accomplished by first employing
U to locate the appropriate database capability (F,S). Then, for the
command <U,(F,Q),a>, the access a is performed on each record R of Q(F)
for which $S.DC_{F,S}(R)$ contains a. For the command <U,(F,R),a>, the
access a is performed on R if a is in $S.DC_{F,S}(R)$. If any data need be
sent to the user as a result of the access command, it is sent to the PES
to be routed to that user.
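
The two forms of access command can be sketched as follows. The capability store is assumed, hypothetically, to be a dict keyed by (user, file name); a missing capability causes the command to be rejected; and instead of actually retrieving, inserting or deleting, `execute_query_command` simply returns the records on which the access would be performed. All names here are invented for illustration, and ACC is again assumed to hold only the three accesses named in the text.

```python
# Hypothetical access set: only the three accesses named in the text.
ACC = frozenset({"retrieve", "insert", "delete"})


class AccessDenied(Exception):
    """Raised when no database capability exists for (user, file)."""


def capability_spec(sanctions):
    """S.DC_{F,S}(R): ACC narrowed by every sanction (Q, A) whose Q holds for R."""
    def spec(record):
        result = ACC
        for query, allowed in sanctions:
            if query(record):
                result = result & frozenset(allowed)
        return result
    return spec


def find_capability(capabilities, user, file_name):
    # Use U to locate the database capability (F, S); reject if there is none.
    sanctions = capabilities.get((user, file_name))
    if sanctions is None:
        raise AccessDenied(f"no database capability for {user} on {file_name}")
    return capability_spec(sanctions)


def execute_query_command(capabilities, files, user, file_name, query, access):
    """<U,(F,Q),a>: the records of Q(F) on which access a would be performed."""
    spec = find_capability(capabilities, user, file_name)
    return [r for r in files[file_name] if query(r) and access in spec(r)]


def execute_record_command(capabilities, user, file_name, record, access):
    """<U,(F,R),a>: True if access a may be performed on record R."""
    spec = find_capability(capabilities, user, file_name)
    return access in spec(record)


files = {"F": [{"Name": "Smith", "Dept": "Sales"}, {"Name": "Jones", "Dept": "Payroll"}]}
caps = {("alice", "F"): [(lambda r: r["Dept"] == "Payroll", {"retrieve"})]}
```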
[Figure 1. Two overlapping response sets, $Q_1$ and $Q_2$. Accesses in $A_1$ are permitted on the records satisfying $Q_1$, and accesses in $A_2$ on the records satisfying $Q_2$; only accesses in $(A_1 \cap A_2)$ are permitted on the records in the shaded area where the two sets intersect.]


2.3  The  Need  for  Front-end  S 
Before a user issues any access commands for a file, the database
capability specifying the user's access rights to that file is sent to the
DBC by the PES. An access command is rejected by the DBC unless the
appropriate database capability is found. It is the responsibility of the PES
to send the correct database capabilities to the DBC and to authorize the
use of access operations by users by constructing appropriate database
capabilities. In this way our DBC design does not impose any restriction
on the nature of the PES's security mechanisms or on the authorization
policies it supports.
3. THEORY OF OPERATION


A model which describes the basic components of the DBC and how they
interact to realize the DBC's functional characteristics is now given.
In the presentation we do not emphasize the intricacies of hardware design.
Instead, we describe the operation of the components at a conceptual level.
In Part II and Part III of the paper, we shall show how these components can
actually be implemented with existing and emerging technology.

The theory of operation is presented in two sections. In the first
section a data model is developed. In the next section we show how this
data model is realized by the DBC with the aid of functionally
specialized components.


3.1  The  Data  Model 


The need for auxiliary data structures arises from the fact that the mass memory is not fully associative. Therefore, a technique to minimize mass memory accesses is required to ensure high performance. We shall employ a PCAM-based mass memory to implement the structure memory concept.

The mass memory's content addressability allows it to contain only update-invariant mapping structures. The data model will allow us to determine the nature of the information to be kept in the structure memory.

When a PCAM partition is used to store records, record placement within the partition does not affect the system's performance. When a set of records is not placed in the same partition, however, the system's performance can be affected, since multiple PCAM accesses may be required to retrieve the records. To address this problem a database is normally partitioned into groups of records whose members should all be physically close to each other. The exact nature of "closeness" depends on the properties of the memory. For example, on a disk with movable read/write heads, records could be considered close if they are stored in the same cylinder. This seems reasonable since the cost of initially accessing a cylinder of the disk is usually much greater than the cost of the subsequent accesses to the same cylinder that immediately follow it. The underlying reason for this is the mechanical motion required to reach a new cylinder.

In the data model we shall consider records to be close when they are stored in the same partition of the PCAM mass memory. To distinguish partitions in the mass memory PCAM from those in other PCAMs, we shall call each of these partitions a minimal access unit (MAU).

There are many reasons for placing one record close to another record. A basic reason, related to performance, is the likelihood that these records will be accessed simultaneously. There are other reasons for grouping records; compartmentalization of records for security reasons is one example. Precisely what features of these records allow the designer to deduce a particular record grouping does not concern us at this time. Our goal as builders of generalized hardware to support a database system is not to choose a specific way to partition the database but instead to provide a general mechanism with which many possible partitionings may be realized. Such a mechanism will be presented shortly.

Let  there  be  L MAUs  in  the  mass  memory  and  let  L be  called  the  minimal 
access  unit  count.  All  L MAUs  are  of  fixed  size.  We  denote  the  minimal 
access  unit  size  by  |MAU|.  Associated  with  the  database  DB  is  the  set  of 
records  denoted  by  M(DB)  and  defined  as  {R:R  is  in  DB}. 

If the set M(DB) is further partitioned into L subsets and each of these subsets represents the records which are placed in a MAU, then the union of the subsets is called a database configuration of M(DB). The size of a record R, i.e., the number of bits needed to represent it in memory, is denoted by |R|. A database configuration is valid if each subset X of M(DB) satisfies the constraint

$$\sum_{R \in X} |R| \le |MAU|$$

In other words, a database configuration is valid if all of the records of M(DB) fit into MAUs of the mass memory. A valid database configuration results in a memory map which describes how the records are placed in the mass memory.
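As a minimal sketch of the validity condition above, assume (for illustration only) that each record is represented by a hypothetical (record id, size in bits) pair and that a configuration is a list of L subsets, one per MAU. The test is then a per-subset sum compared against |MAU|; `is_valid_configuration` is a name introduced here, not one from the paper.

```python
from typing import List, Tuple

# Hypothetical representation of a record: (record id, |R| in bits).
Record = Tuple[str, int]


def is_valid_configuration(config: List[List[Record]], mau_size: int) -> bool:
    """Return True if every subset X satisfies sum(|R| for R in X) <= |MAU|."""
    return all(sum(size for _, size in subset) <= mau_size for subset in config)


# Two MAUs (L = 2); the sizes are made up for illustration.
config = [[("r1", 400), ("r2", 500)], [("r3", 900)]]
print(is_valid_configuration(config, mau_size=1024))  # True
print(is_valid_configuration(config, mau_size=800))   # False
```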

Each MAU is represented by a unique name called the minimal access unit address (MAU address), denoted by f, where 0 ≤ f < L. Let M_f represent the contents of the f-th MAU.

The DB storage structure is defined as the ordered sequence

$$\langle M_0, M_1, \ldots, M_{L-1} \rangle.$$

This sequence represents the distribution of records in the MAUs.

Let F be a file whose records contain just m different keywords, denoted by K_1, K_2, ..., K_m. To keep track of the MAUs in which records containing the keyword K_i are to be found, we form the set D(F,K_i) defined as

$$D(F,K_i) = \{\, f \mid R \in F,\ K_i \in R,\ R \in M_f \,\}.$$

D(F,K_i) is called a directory entry and each element f of D(F,K_i) is called an index term. In words, D(F,K_i) is the set of all names of MAUs which contain one or more records with the keyword K_i.

The directory of file F is defined as the set

$$DIR(F) = \{D(F,K_1), D(F,K_2), \ldots, D(F,K_m)\}.$$
The  directory  of  a file  represents  the  structural  information  needed  to
access  the  mass  storage.  We  shall  see  how  it  is  used  shortly. 
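The definition of D(F,K_i) can be made concrete with a small sketch. It assumes, purely for illustration, that a record is a set of keyword strings, that every record in the storage structure belongs to the file F, and that the f-th list element plays the role of M_f. `build_directory` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

Record = FrozenSet[str]  # a record modelled as its set of keywords


def build_directory(storage: List[List[Record]]) -> Dict[str, Set[int]]:
    """Return DIR(F) as a mapping K -> D(F,K) = {f | R in M_f and K in R}."""
    directory: Dict[str, Set[int]] = defaultdict(set)
    for f, mau in enumerate(storage):      # f is the MAU address, mau is M_f
        for record in mau:
            for keyword in record:
                directory[keyword].add(f)  # f becomes an index term of D(F,K)
    return dict(directory)


storage = [
    [frozenset({"K1", "K2"}), frozenset({"K3"})],  # M_0
    [frozenset({"K1", "K3"})],                     # M_1
]
directory = build_directory(storage)
print(sorted(directory["K1"]))  # [0, 1]
print(sorted(directory["K2"]))  # [0]
```

Each index term records only the MAU, so a query still has to search inside M_f; the point of the directory is to avoid touching MAUs that cannot contain a match.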

As mentioned earlier, the DBC allows the creator of a file to enhance performance by allowing records of the file to be identified as a group (or a cluster) and by accessing such records with minimal access delay. Let us motivate the concept of clustering and the resulting performance improvement by a simple example. Let a file F (to be placed in the DBC) have n records, of which we choose four for our discussion. These four are shown in Figure 2a.

In Figure 2b we show an arbitrary placement of these records in the two MAUs that have been made available in the database for the file F. Now, if a query for retrieval is received in the form "Retrieve records which satisfy the conjunct (K1 ∧ K2)", then the DBC has to make two MAU accesses. However, if the records are placed in the MAUs grouped according to the occurrence of keywords (K1, K2 and K3) in a record, then the resulting configuration will be as shown in Figure 2c. Such a configuration will facilitate the retrieval of all records which satisfy the given query with a single access to the mass memory.

The above discussion implies two things. First, the creator of the file has an idea of the type of queries that will be made on the file. Second, the system (DBC) provides him with a mechanism for effectively conveying that knowledge to the DBC. While we, as system designers, cannot predict how much knowledge a creator may have of his file usage, we must ensure that he is provided with an easy yet powerful mechanism to utilize that knowledge to his best advantage. The mechanism that we have adopted and shall describe here is capable of being naturally integrated into the query language used by the users.


A mandatory clustering condition (MCC) is a query which is used to interrogate the MAUs assigned to a file in order to determine which MAUs are eligible containers of a given record. [We use the word "interrogate" in a literal sense here; i.e., to ask questions about the MAUs.]

An  optional  clustering  condition  (OCC)  is  a query  which  is  used  to 
determine  which  MAU  among  a set  of  MAUs  will  ultimately  contain  a record. 

A clustering condition (CC), either MCC or OCC, is formed by considering a disjunctive normal form of a set of clustering keyword predicates (CKPs).

A clustering keyword (CK) is a keyword which participates in the formation of a CKP. The set of all clustering keywords is called the clustering keyword set (CKS).

A cluster c is defined as the set of records each of which contains exactly the same set of clustering keywords. This set is known as the basis of the cluster. Thus,

$$\text{cluster defined by } \{CK_1, CK_2, \ldots, CK_n\} = \{\, R \mid R \cap CKS = \{CK_1, CK_2, \ldots, CK_n\} \,\}$$

Notice that a cluster may be empty if no record in the file satisfies the above condition. Each cluster is identified by a unique number within the system. Such a number is called the cluster identifier. An optional clustering condition is associated with a number called a cluster weight (CW). We shall elaborate on the use of cluster weights shortly.
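A minimal sketch of cluster formation, assuming records are modelled as keyword sets and the CKS as a set of strings: each record is grouped under its basis, i.e. the exact subset of clustering keywords it contains. The numeric cluster identifiers used by the DBC are replaced here by the basis itself, an illustrative simplification; `form_clusters` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List

Record = FrozenSet[str]


def form_clusters(records: Iterable[Record],
                  cks: FrozenSet[str]) -> Dict[FrozenSet[str], List[Record]]:
    """Group records by basis: the exact set of clustering keywords each contains."""
    clusters: Dict[FrozenSet[str], List[Record]] = defaultdict(list)
    for record in records:
        basis = record & cks          # clustering keywords present in this record
        clusters[basis].append(record)
    return dict(clusters)


cks = frozenset({"K1", "K2"})
records = [frozenset({"K1", "K2", "K7"}), frozenset({"K1", "K2"}),
           frozenset({"K1", "K5"}), frozenset({"K9"})]
for basis, members in form_clusters(records, cks).items():
    print(sorted(basis), len(members))
```

Only non-empty clusters appear in the result; a basis that no record matches corresponds to one of the empty clusters mentioned above.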

We  now  describe  how  the  above  concepts  can  be  used  to  place  a record 
in  the  database. 

A record for insertion is associated (by the creator) with a single mandatory clustering condition (MCC) and a set of optional clustering conditions {OCC_1, OCC_2, ..., OCC_q}. In addition the record contains a set of clustering keywords as part of the record definition. Obviously, in order to produce meaningful clusters, a record's clustering keyword set must satisfy its MCC.


In order to determine the MAU(s) in which the record could be placed, we need to know the identities of the clusters whose bases satisfy the MCC associated with the record and the MAUs in which these clusters reside. Obviously, if we had to access each MAU to determine the cluster identifiers, our purpose of performance enhancement would be defeated. Thus, it is essential that clustering information be kept outside the MAUs themselves, perhaps in the structure memory. Having determined the set of MAUs where the record can be placed, we use the optional clustering conditions to choose one of these MAUs. This is done as follows.
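Let CW_i be the cluster weight of the optional clustering condition OCC_i. Then define, for MAU f,

$$OW(M_f) = \sum_{i=1}^{q} w_i(f), \qquad w_i(f) = \begin{cases} CW_i, & \text{if } M_f \text{ contains a record which satisfies } OCC_i, \\ 0, & \text{otherwise.} \end{cases}$$

The record is then placed in the f-th MAU such that

$$\forall k \; \big( OW(M_f) \ge OW(M_k) \big),$$

where k ranges over the MAUs found eligible under the MCC.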


The meaning of the cluster weights associated with the OCCs is now clear: they express the relative importance of the OCCs with respect to one another. It may be desirable to set a threshold OT so that a record is assigned to the f-th MAU only if OW(M_f) ≥ OT.
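The placement rule can be sketched as follows. This is an illustration under stated assumptions, not the DBC's mechanism: queries are modelled as Python predicates on a record's keyword set, the list of MAUs eligible under the MCC is assumed to have been computed already (for example from clustering information in the structure memory), and ties are broken in favour of the first eligible address, a choice the text does not specify. `optional_weight` and `choose_mau` are hypothetical names.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Record = FrozenSet[str]
Query = Callable[[Record], bool]  # assumption: a query is a predicate on a record


def optional_weight(mau: Sequence[Record], occs: Sequence[Tuple[Query, float]]) -> float:
    """OW(M_f): the sum of CW_i over every OCC_i satisfied by some record in M_f."""
    return sum(cw for occ, cw in occs if any(occ(r) for r in mau))


def choose_mau(storage: List[List[Record]], eligible: Sequence[int],
               occs: Sequence[Tuple[Query, float]],
               threshold: Optional[float] = None) -> Optional[int]:
    """Return the eligible MAU address with maximal OW, or None if none qualifies."""
    if not eligible:
        return None
    # max() keeps the first of several equal maxima, so ties go to the earliest address.
    best = max(eligible, key=lambda f: optional_weight(storage[f], occs))
    if threshold is not None and optional_weight(storage[best], occs) < threshold:
        return None  # OW(M_f) < OT: the record is not assigned to this MAU
    return best


storage = [[frozenset({"K1", "K2"})], [frozenset({"K3"})], [frozenset({"K1", "K3"})]]
occs = [(lambda r: "K1" in r, 2.0), (lambda r: "K3" in r, 1.0)]  # (OCC_i, CW_i)
print(choose_mau(storage, eligible=[0, 1, 2], occs=occs))                 # 2
print(choose_mau(storage, eligible=[0, 1, 2], occs=occs, threshold=4.0))  # None
```

Here OW(M_0) = 2, OW(M_1) = 1 and OW(M_2) = 3, so M_2 wins; raising OT above 3 leaves the record without a placement, which a real system would have to handle separately.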

It should be noted that the clustering keywords of a record play only a minimal role in the insertion of that record. However, they play an important role in the insertion of subsequent records.

We now show how the DBC can group records for security purposes. Certain attributes of a file may be designated as security attributes by the creator of the file. A security keyword is a keyword whose attribute is a security attribute. Each record belonging to such a file with security attributes contains a set of security keywords (possibly empty). This set defines a security atom. A record is said to belong to a security atom if and only if its security keywords define the security atom in question. The concept of security atoms is due to [14]. In Figures 3a, 3b and 3c, we illustrate this concept by means of an example [19].


Figure 3a. Records (only the keywords in the records are shown) to be partitioned into security atoms. Keywords K4, K5 and K6 are security keywords.

Figure 3b. The security atoms of the records of Figure 3a, and their corresponding security keywords.

Figure 3c. Security atoms satisfying Boolean conjuncts of security keywords.


The concept of security atoms can be used to implement a penetration-proof protection mechanism when file sanctions are specified in terms of security keywords only. This is so for two reasons. First, security atoms are disjoint (i.e., a record belongs to one and only one atom) and second, a file sanction made up of security keywords applies either to all records of an atom or to none at all. Thus, it is easy to create a list of security atom identifiers and the applicable file sanctions (or, better still, the corresponding access privilege sets). Whenever an access is requested, the security atom(s) described by the keywords in the query or record are looked up in the list. If the access is permitted by the access privilege set of the atom(s), then the request is accepted; otherwise it is rejected.
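A sketch of the Type A check, assuming that a security atom is identified by its set of security keywords (rather than by an atom name) and that the list consulted for one user and one file maps atoms to access privilege sets. Treating an atom missing from the list as "no access" is an assumption of this sketch, and the keywords K4, K5, K6 and the function names are illustrative.

```python
from typing import Dict, FrozenSet, Set

Record = FrozenSet[str]


def security_atom(record: Record, security_keywords: FrozenSet[str]) -> FrozenSet[str]:
    """A record belongs to the atom defined by exactly its security keywords."""
    return record & security_keywords


def type_a_permits(record: Record, access: str, security_keywords: FrozenSet[str],
                   aapl: Dict[FrozenSet[str], Set[str]]) -> bool:
    """Look up the record's atom in the per-user list; unlisted atoms get no access."""
    atom = security_atom(record, security_keywords)
    return access in aapl.get(atom, set())


security_keywords = frozenset({"K4", "K5", "K6"})
aapl = {  # hypothetical list for one user and one file
    frozenset({"K4", "K6"}): {"retrieve"},
    frozenset({"K5"}): {"retrieve", "delete"},
}
print(type_a_permits(frozenset({"K1", "K4", "K6"}), "retrieve", security_keywords, aapl))  # True
print(type_a_permits(frozenset({"K1", "K4", "K6"}), "delete", security_keywords, aapl))    # False
```

Because atoms are disjoint, each check is a single lookup rather than a scan of all file sanctions, which is where the low security cost comes from.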

It may be argued that a creator may wish to protect his records at the subatomic level or in a manner which affects portions of different atoms. In such cases, a full search of the file sanctions is necessary to determine which of them are applicable to an access request. Thus, the data model supports two protection mechanisms. The first is geared towards reducing security costs to a minimum, while the other aims at providing maximum flexibility to the user. For the sake of convenience, we shall call the protection mechanism based on security atoms the Type A protection mechanism. The other protection mechanism, based on a full search of the file sanctions, will be called the Type B protection mechanism.

From the above discussion, we conclude that the data model specifies three steps by which a record may be evaluated for placement. First, the MAU where the record is to be placed is determined by the clustering conditions specified by the creator for the record; second, the cluster to which it belongs is determined by the clustering keywords in the record; and finally, the security atom to which the record belongs (if the creator has chosen to specify file sanctions in terms of security keywords) is determined by the set of security keywords appearing in the record.

3.2  The  Basic  DBC  Operations 

The basic DBC operations are security enforcement, record insertion, record retrieval and record deletion. We first give a brief description of these operations and relate them to their supporting components. Then we show in some depth the data structures and algorithms involved in the operations.

3.2.1  The  Role  of  Security  Enforcement

The security filter processor (SFP) and the database command and control processor (DBCCP) jointly maintain the database capabilities for the active users of the system. In order for them to correctly enforce a security policy, the proper database capabilities must be provided by the PES. A table is kept for each user with the database capabilities for each active file.

Let each table entry have the form

(F, {(Q_1, A_1), (Q_2, A_2), ..., (Q_k, A_k)})

where the set of couples is a database capability.

Commands of the form

(U, (F,Q), a) and
(U, (F,R), a)

pass through the SFP or the DBCCP depending on the type of protection mechanism chosen by the user. If the creator has chosen the Type A protection mechanism, the DBCCP converts the file sanctions into a list called the atomic access privilege list (AAPL). The AAPL has the form

(U, F, {(SAN_1, APD_1), (SAN_2, APD_2), ..., (SAN_p, APD_p)})

where SAN_i is the name of the i-th security atom of the file F and APD_i is the access privilege set associated with SAN_i for the user U. In forming the AAPL, the DBCCP makes use of all the DBC components except the mass memory and the SFP. This results in minimal delay in creating the list. If the creator has chosen the Type B protection mechanism, the SFP takes over the maintenance and usage of the file sanctions.

Records are sent into the DBC by way of commands of the form

(U, (F, R, MCC, {OCC_i}), "insert")

When such a command is received by the DBCCP, the record to be inserted is checked for security clearance with the aid of the AAPL (Type A protection mechanism) or the file sanctions (Type B). If the result of the check indicates that the record may be inserted, then the DBCCP proceeds with the actual insertion process.

When a command (U, (F,Q), "retrieve") is received by the DBCCP, the query Q undergoes a similar check. If the check is successful, the mass memory is instructed to retrieve the relevant records, which form the response set Q(F). Each record R in the set Q(F) is tagged with the user identification and file name, as (F,U,R). If the user has specified the Type B protection mechanism, then the retrieved records are subject to a security check by the SFP before they are passed on to the PES. This is because the records may contain keywords (in addition to, and including, those that are required to satisfy the query Q) which satisfy the query parts of file sanctions. The access privilege sets of such file sanctions then become applicable to the records. As a result some of the retrieved records may not be passed on to the user. Such a drop in precision is part of the price a user pays for the wide latitude the system provides in specifying security information.

To execute the command (U, (F,Q), "delete") the query Q is put through a similar check. If the access "delete" is not granted, the command is rejected. If the access is granted, the mass memory is instructed to proceed with the access. In the case of the Type B protection mechanism, as each record is accessed, it is sent to the SFP for a check against the set of file sanctions. The rationale for this check is the same as the one given for the "retrieve" command. If the check is successful, the mass memory proceeds to delete the record from the database; otherwise, the record is not deleted.
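For the Type B mechanism, the SFP check applied to each retrieved or to-be-deleted record can be sketched as below. The sketch assumes that file sanctions are (query part, access privilege set) pairs with the query part modelled as a predicate on the record's keywords, and that when several sanctions apply to a record, only accesses common to all of their access privilege sets are permitted. Denying access when no sanction applies is likewise an assumption; `sfp_permits` and `sfp_filter` are hypothetical names.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Record = FrozenSet[str]
Sanction = Tuple[Callable[[Record], bool], Set[str]]  # (query part, access privilege set)


def sfp_permits(record: Record, access: str, sanctions: Sequence[Sanction]) -> bool:
    """Permit access only if it lies in every applicable access privilege set."""
    applicable = [aps for query, aps in sanctions if query(record)]
    if not applicable:  # assumption of this sketch: no applicable sanction, no access
        return False
    return all(access in aps for aps in applicable)


def sfp_filter(records: Sequence[Record], access: str,
               sanctions: Sequence[Sanction]) -> List[Record]:
    """Drop the records the user may not receive (or delete)."""
    return [r for r in records if sfp_permits(r, access, sanctions)]


sanctions = [(lambda r: "K1" in r, {"retrieve", "delete"}),
             (lambda r: "K2" in r, {"retrieve"})]
retrieved = [frozenset({"K1"}), frozenset({"K1", "K2"}), frozenset({"K3"})]
print(sfp_filter(retrieved, "retrieve", sanctions))  # the first two records
print(sfp_filter(retrieved, "delete", sanctions))    # only the first record
```

The record {K1, K2} shows the loss of precision described above: it satisfies the query, but the second sanction withholds "delete" from it.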
3.2.2  Name-Mapping  and  the  System  Components


The retrieve and delete commands both employ Q as a parameter. The subsequent processing of Q that is necessary to execute these commands had the greatest effect in determining the architectural components of the system. We shall now provide an introduction to these components.
A query Q in these commands is in a disjunctive normal form as follows:

$$(T^1_1 \wedge \cdots \wedge T^1_{n_1}) \vee \cdots \vee (T^m_1 \wedge \cdots \wedge T^m_{n_m})$$

where the T^i_j are keyword predicates. The i-th conjunct of this query is denoted by Q^i. To form the response set Q(F) the mass memory must be given two arguments: a query Q and a MAU address f. Given these arguments the mass memory will locate all records in M_f that satisfy the query Q. We saw earlier that each of the index terms in the directory entry of a keyword defines an MAU address f. In the light of the later discussion (on clustering techniques and the security atom concept) it became apparent that the index terms must carry information not only about MAU addresses but also about cluster identifiers and security atoms. Thus an augmented directory entry for a keyword K of file F is defined as

$$D(F,K) = \{\, (f,c,s) \mid \exists R:\ R \in M_f,\ R \in \text{cluster } c,\ R \in \text{security atom } s,\ K \in R \,\}$$

The triple (f,c,s) will be called an augmented index term. In cases where the user has chosen the Type B protection mechanism the security atom concept is not applicable and the third member of an augmented index term is null. In future discussions, by index terms we will always mean augmented index terms. To obtain the MAUs for a conjunct Q^i, all index terms for keywords satisfying each T^i_j of the conjunct must be found. Once found, a set intersection operation is performed over the index terms. The resulting index terms are those whose keywords will make the conjunct Q^i true. The MAU addresses derived from these index terms are then used as arguments to retrieve records from the mass memory.

An algorithm which forms the response set Q(F) is given in Figure 4. In the algorithm, we have temporarily ignored security considerations for the sake of clarity. In line 5 of this algorithm, the index terms are fetched from all directory entries D(F,K) whose keyword K satisfies T^i_j and are placed in a set ω(j). In lines 3-6, one set ω(j) is formed for each keyword predicate T^i_j in Q^i. Then in line 7 these sets are intersected to give the set θ(i). (Line 7 carries out an intersection operation since the keyword predicates of Q^i are ANDed together.) In line 8, the MAUs to be searched are extracted from the index terms obtained in line 7. In lines 1-9, a set θ(i) is formed for each conjunct Q^i, and finally in lines 10-11 the records are retrieved from the mass memory. The response set is defined as the union, over all pairs (Q^k, f) in E, of the sets

{ R | R ∈ M_f and R satisfies Q^k }.

  0. begin
  1.   for i = 1, 2, ..., m do
  2.   begin
  3.     for j = 1, 2, ..., n_i do
  4.     begin
  5.       ω(j) = {(f,c,s) | K satisfies T^i_j and (f,c,s) ∈ D(F,K)}
  6.     end;
  7.     θ(i) = ω(1) ∩ ω(2) ∩ ... ∩ ω(n_i);
  8.     θ'(i) = {f | (f,c,s) ∈ θ(i)}
  9.   end;
 10.   E = {(Q^k, f) | f ∈ θ'(k)};
 11.   Q(F) = ∪ over (Q^k, f) ∈ E of {R | R ∈ M_f and R satisfies Q^k}
 12. end;

Figure 4. A Name-Mapping Algorithm

This algorithm also shows how the data structures defined in the data model are used for name-mapping. The content addressability employed by the DBC will, in fact, allow the actual realization of the data structures to be just as simple as those illustrated here.
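The name-mapping algorithm of Figure 4 can be transcribed into a short Python sketch. Assumptions for illustration only: a query is a list of conjuncts, each a list of keyword predicates (callables on a keyword); the directory of a single file F is a dict from keyword K to the set of index terms D(F,K); and the storage structure is a list whose f-th element is M_f. Security is ignored, as in the figure, and `name_map` and `keyword_is` are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Record = FrozenSet[str]
Predicate = Callable[[str], bool]   # keyword predicate T^i_j
IndexTerm = Tuple[int, int, int]    # augmented index term (f, c, s)


def satisfies(record: Record, conjunct: List[Predicate]) -> bool:
    """A record satisfies a conjunct if each predicate holds for one of its keywords."""
    return all(any(t(k) for k in record) for t in conjunct)


def name_map(query: List[List[Predicate]], directory: Dict[str, Set[IndexTerm]],
             storage: List[List[Record]]) -> List[Record]:
    """Form Q(F) as in Figure 4 (security ignored)."""
    e: Set[Tuple[int, int]] = set()             # pairs (k, f): conjunct Q^k, MAU f
    for k, conjunct in enumerate(query):        # outer loop over conjuncts
        # one omega(j) per keyword predicate (lines 3-6; the fetch is line 5)
        omegas = [{term for key, terms in directory.items() if t(key) for term in terms}
                  for t in conjunct]
        theta = set.intersection(*omegas) if omegas else set()   # line 7
        e |= {(k, f) for (f, _c, _s) in theta}  # line 8: MAU addresses theta'
    response: List[Record] = []
    seen: Set[Tuple[int, int]] = set()          # (f, position) of records already output
    for k, f in sorted(e):                      # lines 10-11: search M_f with Q^k
        for pos, record in enumerate(storage[f]):
            if (f, pos) not in seen and satisfies(record, query[k]):
                seen.add((f, pos))
                response.append(record)
    return response


def keyword_is(name: str) -> Predicate:
    """Hypothetical helper building the equality predicate K = name."""
    return lambda k: k == name


storage = [[frozenset({"K1", "K2"}), frozenset({"K3"})],
           [frozenset({"K1"}), frozenset({"K2", "K3"})]]
directory = {"K1": {(0, 1, 0), (1, 3, 0)}, "K2": {(0, 1, 0), (1, 4, 0)},
             "K3": {(0, 2, 0), (1, 4, 0)}}
q = [[keyword_is("K1"), keyword_is("K2")], [keyword_is("K3")]]   # (K1 AND K2) OR K3
print(name_map(q, directory, storage))
```

For the conjunct K1 ∧ K2 only the index term (0,1,0) survives the intersection, so M_1 is never searched for it; the `seen` set realizes the union so that a record matched by two conjuncts is returned once.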

This algorithm shows us what the structure memory must do. The structure memory must store directory entries and be able to accept a keyword predicate T and retrieve all index terms for all keywords which satisfy T (as in line 5). Clearly, the structure memory will also have to be able to add, delete and modify directory entries. The algorithm also shows us the nature of the structure memory information processing, namely, set manipulation (line 7). These observations help us outline the architecture of the DBC. The DBC contains at least six functionally specialized components: the database command and control processor (DBCCP), the security filter processor (SFP), the mass memory (MM), the structure memory (SM), the structure memory information processor (SMIP), and the index translation unit (IXU). The DBCCP is responsible for translating DBMS commands into lower-level commands for the mass memory and coordinating the actions of the other components. The MM contains DB, the SM stores the directory entries and the SMIP is a set operation processor. The index translation unit is responsible for extracting the MAU addresses from the augmented index terms. The organization of these components, to a first order of detail, is shown in Figure 5. We shall see later in Part II that a seventh component, namely the keyword transformation unit (KXU), is needed for an efficient physical realization of the DBC.


3.2.3

The theory of operation continues with an exposition of the operating principles of the structure memory, the structure memory information processor, and the mass memory. The carefully tailored functional characteristics of these components allow them to readily carry out numerous steps of the DBS algorithms to be given. The description of the components that follows is only conceptual in nature; the actual hardware organization used to realize them is given in Part II and Part III of the paper.

Figure 5. Architecture of DBC


A.  Structure  Memory 

The SM is the repository of the directories of the files in the DB. Each index term (f,c,s) of D(F,K) is stored in the SM as the tuple (F,K,f,c,s). The contents of the SM may therefore be viewed as a set of such tuples, known as the structure memory basis (SMB), defining the directories of all files.

The SM retrieve command has the form SM<retrieve,(F,T)>, where F is a file name and T is a keyword predicate. The command is carried out by constructing a set containing all index terms of each directory entry D(F,K) whose keyword K satisfies T. Formally, the SM executes the command SM<retrieve,(F,T)> by outputting the set

$$\{\, (f,c,s) \mid (F,K,f,c,s) \in SMB \text{ and } K \text{ satisfies } T \,\}$$

The insert command has the form SM<insert,(F,K,f,c,s)> and is executed by adding (f,c,s) to the set D(F,K). In other words, the insert command is executed by replacing SMB with SMB ∪ {(F,K,f,c,s)}.

The delete command has the form SM<delete,(F,K,f,c,s)> and is executed by removing (f,c,s) from D(F,K). Formally, the delete command is executed by replacing SMB with SMB − {(F,K,f,c,s)}.

To model its operations the SM can be viewed as a PCAM with M content-addressable blocks. The SM partitions the set SMB into N subsets, designated SMB_i, 0 ≤ i < N, where N ≤ M. Each subset is stored in one or more blocks of the PCAM.

The retrieve command is executed by first applying to T a hash function which maps it into an integer j, where 0 ≤ j < N. Then the set SMB_j is searched by accessing the appropriate block(s) of the PCAM to locate and retrieve the tuples (F,K,f,c,s) whose keyword K satisfies T.

Insert and delete commands are executed by applying to K a hash function which maps it into an integer j. The tuple (F,K,f,c,s) is then added to or removed from the subset SMB_j by accessing the appropriate block of the PCAM.
The nature of the hash function will strongly influence the kinds of keyword predicates that may be used by the system. This issue, along with a description of how the sets SMB_i are stored in the PCAM and how the SM and its PCAM are realized, is addressed in Part II of this paper.
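A toy model of the hashed SM organization follows. Because the hash function constrains the predicates that can be supported, this sketch assumes equality predicates only (T is "K = keyword"), so that retrieve can hash the keyword itself. The CRC-32 hash and the class name `StructureMemory` are stand-ins chosen for illustration, not the DBC's design.

```python
import zlib
from typing import List, Set, Tuple

SMTuple = Tuple[str, str, int, int, int]   # (F, K, f, c, s)


class StructureMemory:
    """SMB split into N hashed subsets SMB_0 .. SMB_{N-1} (a software stand-in for the PCAM)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.subsets: List[Set[SMTuple]] = [set() for _ in range(n)]

    def _bucket(self, keyword: str) -> int:
        # Stand-in hash: maps a keyword to j with 0 <= j < N.
        return zlib.crc32(keyword.encode("utf-8")) % self.n

    def insert(self, t: SMTuple) -> None:          # SM<insert,(F,K,f,c,s)>
        self.subsets[self._bucket(t[1])].add(t)

    def delete(self, t: SMTuple) -> None:          # SM<delete,(F,K,f,c,s)>
        self.subsets[self._bucket(t[1])].discard(t)

    def retrieve(self, file: str, keyword: str) -> Set[Tuple[int, int, int]]:
        """SM<retrieve,(F,T)> for the equality predicate T: K = keyword."""
        bucket = self.subsets[self._bucket(keyword)]   # only SMB_j is searched
        return {(f, c, s) for (fname, k, f, c, s) in bucket if fname == file and k == keyword}


sm = StructureMemory(n=4)
sm.insert(("F", "K1", 0, 1, 0))
sm.insert(("F", "K1", 1, 3, 0))
sm.delete(("F", "K1", 1, 3, 0))
print(sm.retrieve("F", "K1"))  # {(0, 1, 0)}
```

A range predicate such as "K > 5" could not be answered by hashing to a single SMB_j, which illustrates why the choice of hash function limits the admissible keyword predicates.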

Consideration is now given to the fact that the SM is a two-level system containing a directory entry storage and a look-aside buffer. We now extend the aforementioned operations to the two-level SM. Let the directory entry storage be represented by the set SMB defined above. The time required to update this set (i.e., to add or delete an element) is fairly long compared to the time required to update, say, a fast-access semiconductor RAM. The look-aside buffer allows SM update operations to appear as though they were executed immediately.

The look-aside buffer may be conceptually represented by an ordered set LKA of SM update commands:

$$LKA = \langle command_1, command_2, \ldots, command_n \rangle$$

where command_i preceded command_{i+1} in time. The look-aside buffer has two functions: it acts as a command queue for the SM, and it contains the information which allows the SM to appear updated. The look-aside buffer would be realized with high-speed random access memory, and so its access time would be much less than that of the directory entry storage.

Whenever an update command is received by the SM it is placed in LKA. However, if an insert (delete) command negates the effect of a previous delete (insert) command, then the insert (delete) command is not added to LKA and the negated delete (insert) command is removed from LKA.

To execute a retrieve command the two-level SM first examines LKA for commands which add index terms (f,c,s) to directory entries D(F,K) whose keyword K satisfies T. All index terms so found are output. Then the set SMB is searched for additional index terms. When an index term (f,c,s) of a directory entry D(F,K) whose K satisfies T is retrieved from SMB, it is checked in the following way: if there is a command in LKA to delete (f,c,s), then that index term is not output from the SM.
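The look-aside buffer behaviour can be sketched as an ordered queue in front of a slow set. Assumptions for illustration: equality predicates only, tuples of the form (F,K,f,c,s), and a hypothetical `drain` method standing in for the background application of queued commands to the directory entry storage; the class name `TwoLevelSM` is likewise invented here.

```python
from typing import List, Set, Tuple

SMTuple = Tuple[str, str, int, int, int]   # (F, K, f, c, s)


class TwoLevelSM:
    """Directory entry storage (SMB) fronted by an ordered look-aside buffer (LKA)."""

    def __init__(self) -> None:
        self.smb: Set[SMTuple] = set()               # slow directory entry storage
        self.lka: List[Tuple[str, SMTuple]] = []     # queued ("insert" | "delete", tuple)

    def update(self, op: str, t: SMTuple) -> None:
        opposite = "delete" if op == "insert" else "insert"
        if (opposite, t) in self.lka:
            self.lka.remove((opposite, t))           # the two commands cancel out
        else:
            self.lka.append((op, t))

    def retrieve(self, file: str, keyword: str) -> Set[Tuple[int, int, int]]:
        """Retrieve for the equality predicate K = keyword."""
        key = (file, keyword)
        added = {t[2:] for op, t in self.lka if op == "insert" and t[:2] == key}
        pending_deletes = {t for op, t in self.lka if op == "delete"}
        stored = {t[2:] for t in self.smb if t[:2] == key and t not in pending_deletes}
        return added | stored

    def drain(self) -> None:
        """Hypothetical background step: apply queued commands to SMB in order."""
        for op, t in self.lka:
            if op == "insert":
                self.smb.add(t)
            else:
                self.smb.discard(t)
        self.lka.clear()


sm = TwoLevelSM()
sm.smb = {("F", "K1", 0, 1, 0), ("F", "K1", 1, 3, 0)}
sm.update("delete", ("F", "K1", 1, 3, 0))
sm.update("insert", ("F", "K1", 2, 5, 0))
print(sorted(sm.retrieve("F", "K1")))  # [(0, 1, 0), (2, 5, 0)]
```

Retrieval sees both pending commands immediately, even though SMB itself has not yet changed.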

B.  Structure  Memory  Information  Processor 


The  SMIP  is  a processor  for  set  manipulation.  Set  manipulation 


operation  are  performed  by  maintaining  an  Intermediate  set  In  the  SMTP 
while  the  argument  aet8  which  modify  it  are  paaaed  through  the  SMIP. 

The SMIP's intermediate set is designated SW and consists of couples (m,d) called SMIP data units. The first part m of the couple is called the key and the second part d is called the data. Operations are performed on SW by identifying a SMIP data unit and by performing an operation on it. There are two kinds of SMIP commands. The first kind of SMIP command is represented by SMIP<m,g>, where m is a key and g is a manipulation function. The manipulation function can do two things: first, it can specify how the data part of a SMIP data unit (m,d) with key m should be modified; and second, it can specify what should be done if no SMIP data unit with key m is in SW. When no SMIP data unit with key m is found and no action is specified by g, the SMIP takes no action. The second kind of SMIP command has the form SMIP<g>, where g specifies an action that is to occur.
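The two kinds of SMIP command can be modeled as follows. This sketch assumes that a manipulation function g can be split into a "modify" part and an "if absent" part; the names `Manip`, `SMIP.keyed` and `SMIP.broadcast` are hypothetical, and SW is a plain dictionary from keys m to data d.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Manip:
    """Hypothetical encoding of a manipulation function g."""
    modify: Optional[Callable[[Any], Any]] = None  # new d computed from old d
    if_absent: Optional[Callable[[], Any]] = None  # d for a new unit if key m is missing


class SMIP:
    def __init__(self) -> None:
        self.sw: dict[Any, Any] = {}  # SW: key m -> data d

    def keyed(self, m: Any, g: Manip) -> None:
        """SMIP<m,g>: act on the SMIP data unit with key m."""
        if m in self.sw:
            if g.modify is not None:
                self.sw[m] = g.modify(self.sw[m])
        elif g.if_absent is not None:
            self.sw[m] = g.if_absent()
        # No unit with key m and no action specified by g: do nothing.

    def broadcast(self, g: Callable[[dict], Any]) -> Any:
        """SMIP<g>: an action applied to SW as a whole."""
        return g(self.sw)
```

For instance, `keyed("a", Manip(if_absent=lambda: 1))` creates the unit ("a", 1), a second call with `modify=lambda d: d + 1` turns it into ("a", 2), and the same call on a missing key "b" does nothing because g specifies no action for that case.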

To illustrate the set manipulation functions, let us show how the SMIP can be used to perform an N-set intersection. Let $X_j$ represent one of these N sets and let $x_j$ represent an element of $X_j$. The algorithm which performs the intersection is shown in Figure 6.

In lines 1-4 of the algorithm a SMIP data unit of the form $(x_1, 1)$ is created for each element $x_1$ of $X_1$. In lines 5-11 each element of the sets $X_2, X_3, \ldots, X_N$ is examined and, whenever a matching SMIP data unit is found, its data part is incremented by 1. When these steps are completed, SW contains SMIP data units which indicate in how many of the sets $X_j$ each element of $X_1$ appears. Those elements appearing in all N sets make up the set $X_1 \cap X_2 \cap \cdots \cap X_N$. In line 12 all such elements are retrieved from the SMIP.
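The counting idea behind Figure 6 can be written directly in Python, with a dictionary standing in for SW. This is an illustrative sketch rather than the SMIP's implementation; the function name `smip_intersection` is hypothetical, and the inputs are assumed to be collections of hashable elements.

```python
from typing import Hashable, Iterable, Sequence


def smip_intersection(sets: Sequence[Iterable[Hashable]]) -> set:
    """N-set intersection in the style of Figure 6, with SW as a dict."""
    n = len(sets)
    if n == 0:
        return set()
    sw: dict = {}
    # Lines 1-4: create the unit (x_1, 1) for each element of X_1.
    for x in sets[0]:
        sw[x] = 1
    # Lines 5-11: for X_2 .. X_N, replace (x_j, d) with (x_j, d + 1) when a
    # unit with key x_j exists; otherwise take no action.
    for xs in sets[1:]:
        for x in set(xs):  # set semantics: each element counted once per X_j
            if x in sw:
                sw[x] += 1
    # Line 12: retrieve the keys m of all units (m, d) with d = N.
    return {m for m, d in sw.items() if d == n}
```

Calling `smip_intersection([{1, 2, 3, 4}, {2, 3, 4}, {3, 4, 5}])` leaves SW as {1: 1, 2: 2, 3: 3, 4: 3} and returns {3, 4}.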

The SMIP is also realized with a PCAM. To model the operation of the SMIP, a PCAM with M content-addressable blocks is used. The SMIP partitions the set SW into N subsets designated $SWB_j$, where $N \le M$. Each subset is stored in one or more blocks of the PCAM.

The command SMIP<m,g> is executed by applying to m a hash function which maps it into an integer j, where $0 \le j < N$. Then $SWB_j$ is searched for a SMIP data unit with the key m. If it is found, g is applied to its data part. If no SMIP data unit is found, then any other action that g specified is carried out on $SWB_j$. The command SMIP<g> is executed by ordering each block of the SMIP PCAM to perform the operation specified by g. In Part II of this paper we shall go into detail about the construction of the SMIP and its PCAM.

    0.  begin
    1.    for each element x_1 of X_1 do
    2.    begin
    3.      execute the command SMIP<x_1, "create (x_1, 1)">
    4.    end
    5.    for j = 2, 3, ..., N do
    6.    begin
    7.      for each element x_j of X_j do
    8.      begin
    9.        execute SMIP<x_j, "replace (x_j, d) with (x_j, d+1)">
    10.     end
    11.   end
    12.   execute SMIP<"retrieve the key m from all (m, d) where d = N">
    13. end

Figure 6. An N-set Intersection Algorithm Using the SMIP
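The hashed partitioning of SW described in this subsection can be sketched as below. The sketch assumes Python's built-in `hash()` in place of the SMIP's unspecified hash function and uses one dictionary per subset $SWB_j$; the class `PartitionedSW` is hypothetical. A keyed command touches exactly one subset, while a command of the form SMIP<g> visits every subset, which the PCAM hardware would do in parallel rather than in the sequential loop shown here.

```python
from typing import Any, Callable, Optional


class PartitionedSW:
    """Sketch of SW split into N subsets SWB_0 .. SWB_{N-1}.

    Python's built-in hash() stands in for the SMIP's hash function, and
    each dict stands in for one or more PCAM blocks."""

    def __init__(self, n_subsets: int) -> None:
        self.swb: list[dict[Any, Any]] = [{} for _ in range(n_subsets)]

    def bucket(self, m: Any) -> int:
        return hash(m) % len(self.swb)  # an integer j with 0 <= j < N

    def keyed(self, m: Any,
              modify: Optional[Callable[[Any], Any]] = None,
              if_absent: Optional[Callable[[], Any]] = None) -> None:
        """SMIP<m,g>: only the one subset SWB_j is searched."""
        block = self.swb[self.bucket(m)]
        if m in block:
            if modify is not None:
                block[m] = modify(block[m])
        elif if_absent is not None:
            block[m] = if_absent()

    def broadcast(self, g: Callable[[dict], list]) -> list:
        """SMIP<g>: every subset applies g; the partial results are merged."""
        out: list = []
        for block in self.swb:
            out.extend(g(block))
        return out
```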


C. Mass Memory

The MM is the repository of the database itself. To retrieve data from the database, queries and the MAU addresses at which the data reside must be given to the MM. (These MAU addresses are normally provided by the SM, SMIP and IXU after processing a given query Q.)

Mass memory commands have two forms. The first form, MM<a,U,(F,Q),f>, specifies an access type a, a user U, a query Q, and an MAU address f. It is executed by performing access a on the records in $M_f$ satisfying Q. While executing this command the MM may use the SFP to validate an access. The second form, MM<a,U,R,f>, is used to insert a record R into $M_f$.
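A toy model of the two mass memory command forms follows. Everything here is an assumption for illustration: records are dictionaries, a query Q is a Python predicate over one record, only the retrieve and delete access types are modeled for the first form, and both the file name F and the SFP validation step are omitted. The class `MassMemory` is hypothetical.

```python
from typing import Callable

# Hypothetical layouts: a record is a dict of attribute -> value, and a
# query Q is a predicate over a single record.
Record = dict
Query = Callable[[Record], bool]


class MassMemory:
    """Sketch of the MM as a map from MAU address f to the records in M_f."""

    def __init__(self) -> None:
        self.maus: dict[str, list[Record]] = {}

    def access(self, a: str, user: str, q: Query, f: str) -> list[Record]:
        """MM<a,U,(F,Q),f>: perform access a on the records in M_f satisfying Q.

        Only 'retrieve' and 'delete' are modeled; the SFP check on `user`
        is omitted from this sketch."""
        hits = [r for r in self.maus.get(f, []) if q(r)]
        if a == "delete":
            self.maus[f] = [r for r in self.maus.get(f, []) if not q(r)]
        elif a != "retrieve":
            raise ValueError(f"access type {a!r} not modeled")
        return hits

    def insert(self, user: str, r: Record, f: str) -> None:
        """MM<a,U,R,f> with a = insert: place record R in M_f."""
        self.maus.setdefault(f, []).append(r)
```

Because every command names an MAU, the content search is confined to the records of $M_f$ rather than the whole database.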


3.2.4 The Execution of Record Operations

In this section we discuss the record insertion, deletion and retrieval operations. We indicate the basic requirements of these operations and give algorithms to show how these operations are carried out by the DBC's components.

Record retrieval is simply the formation of the response set Q(F); an abstract algorithm to do this has already been given in Figure 4. An equivalent algorithm, using explicit SM, SMIP and MM commands, is given in Figure 7. This algorithm executes the DBC retrieval command <U,(F,Q),"retrieve"> as discussed in Section 2.2, where Q has the form

$$Q = Q_1 \lor Q_2 \lor \cdots \lor Q_m, \qquad Q_i = T^i_1 \land T^i_2 \land \cdots \land T^i_{n_i},$$

and each $T^i_j$ is a keyword predicate.
    0.  begin
    1.    Q(F) = { }
    2.    E = { }
    3.    for i = 1, 2, ..., m do
    4.    begin
    5.      SMIP<"reset">
    6.      ω = SM<F, T^i_1>
    7.      for every element (f,c,s) in ω do
    8.      begin
    9.        SMIP<(f,c,s), "create ((f,c,s), 1)">
    10.     end
    11.     for j = 2, 3, ..., n_i do
    12.     begin
    13.       ω = SM<F, T^i_j>
    14.       for every element (f,c,s) in ω do
    15.       begin
    16.         SMIP<(f,c,s), "replace ((f,c,s), x) by ((f,c,s), j) if x = j-1">
    17.       end
    18.     end
    19.     ω = SMIP<"output key (f,c,s) for all elements ((f,c,s), d) where d = n_i">
    20.     E = E ∪ {(Q_i, (f,c,s)) | (f,c,s) ∈ ω}
    21.   end
    22.   E' = { }
    23.   for every element (Q_k, (f,c,s)) in E do
    24.   begin
    25.     IXU<extract (f) from (f,c,s)>
    26.     E' = E' ∪ {(Q_k, (f))}
    27.   end
    28.   for every element (Q_k, (f)) in E' do
    29.   begin
    30.     Q(F) = Q(F) ∪ MM<retrieve, U, (F, Q_k), (f)>
    31.   end
    32. end

Figure 7. An Algorithm to Form Q(F)
In lines 5-21 of the algorithm, one conjunct $Q_i$ of the query is processed. In lines 6-10 all index terms for $T^i_1$ are fetched from the SM and data elements are loaded into the SMIP. The command SMIP<"reset"> clears the SMIP storage areas. In lines 11-18, all index terms for each $T^i_j$ are fetched from the SM and the elements of all of these sets are intersected in the SMIP. The SMIP command in line 16 ensures that multiple occurrences of the same index term for one keyword predicate will not affect the intersection process, by incrementing the occurrence count x of a SMIP data unit at most once for each $T^i_j$. In line 19 all index terms in the intersection are retrieved. Lines 3-21 thus process all conjuncts and place in E all of the information needed to access the MM; in lines 22-27 this information is used to build a set containing the information needed to issue MM commands. Finally, in lines 28-32 the MM commands are formed and executed, resulting in the actual retrieval of the records.
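The data flow of Figure 7 can be followed in a sequential Python sketch. It assumes a query already in disjunctive normal form, a callable `sm` standing in for SM<F,T>, a callable `mm` standing in for MM<retrieve,U,(F,Q_k),(f)>, and index translation by the IXU reduced to taking the first component of (f,c,s). The function and parameter names are hypothetical, and the sketch runs the components one after another instead of in parallel.

```python
from typing import Callable, Iterable

IndexTerm = tuple[str, str, str]  # (f, c, s): MAU, cluster, security atom


def form_response_set(
    query: list[list[str]],
    sm: Callable[[str], Iterable[IndexTerm]],
    mm: Callable[[list[str], str], list],
) -> list:
    """Sketch of Figure 7. `query` is a list of conjuncts Q_i, each a list of
    keyword-predicate names T^i_j. `sm(t)` plays SM<F, T> and
    `mm(conjunct, f)` plays MM<retrieve, U, (F, Q_k), (f)>."""
    e: set[tuple[int, IndexTerm]] = set()             # E
    for i, conjunct in enumerate(query):              # lines 3-21
        sw: dict[IndexTerm, int] = {}                 # line 5: SMIP<"reset">
        for term in sm(conjunct[0]):                  # lines 6-10: create (term, 1)
            sw.setdefault(term, 1)
        for j, t in enumerate(conjunct[1:], start=2):  # lines 11-18
            for term in sm(t):
                if sw.get(term) == j - 1:             # line 16: once per T^i_j
                    sw[term] = j
        n_i = len(conjunct)
        e |= {(i, term) for term, d in sw.items() if d == n_i}  # lines 19-20
    e_prime = {(i, term[0]) for i, term in e}         # lines 22-27: IXU extracts f
    response: list = []                               # Q(F)
    for i, f in sorted(e_prime):                      # lines 28-32
        for record in mm(query[i], f):
            if record not in response:                # set union on records
                response.append(record)
    return response
```

If the SM returns the same index term twice for one predicate, the guard `sw.get(term) == j - 1` counts it only once, which mirrors the role of line 16.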

The SM, SMIP, IXU, MM and DBCCP would operate in parallel to execute the above algorithm. The DBCCP would execute the control statements of the algorithm while the other components are executing commands. To do this the DBCCP would order a component to execute a command and then continue to process the algorithm as far as possible. The SM, SMIP and IXU operate in a tightly coupled parallel fashion. Whenever an element is output from the SM (line 6 or 13), it is sent directly to the SMIP, where it is processed (line 9 or 16). Whenever an element is output from the SMIP (as in line 19), it is sent directly to the IXU for translation. (This parallelism is not evident in the algorithm.) The MM can also operate in parallel with the other components. The controller can create MM commands on the fly when it executes line 26 and send them directly to the MM; this technique would be equivalent to the operations specified in lines 26 and 28-32. The MM could also execute commands for other queries while this algorithm was being executed. The SFP would, of course, also operate in parallel with the algorithm to check (if necessary) all records leaving the DBC.
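The tightly coupled SM, SMIP and IXU pipeline can be imitated with Python threads connected by `queue.Queue` objects. This is a sketch under assumptions: the stage functions and the `DONE` marker are hypothetical, the index is a dictionary from predicate names to lists of index terms, only one conjunct is processed, and CPython threads show the overlap of stages rather than true hardware parallelism. Because the SM stage emits all terms for the first predicate before those for the second, and a queue preserves order, the SMIP stage can apply the Figure 7 update rule as each term arrives.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed between stages


def sm_stage(predicates, index, out_q):
    """Hypothetical SM: streams (j, index term) pairs for each predicate T_j."""
    for j, t in enumerate(predicates, start=1):
        for term in index.get(t, []):
            out_q.put((j, term))
    out_q.put(DONE)


def smip_stage(n, in_q, out_q):
    """SMIP: processes each term as it arrives (lines 9 and 16 of Figure 7)."""
    sw = {}
    while (item := in_q.get()) is not DONE:
        j, term = item
        if j == 1:
            sw.setdefault(term, 1)
        elif sw.get(term) == j - 1:
            sw[term] = j
    for term, d in sw.items():  # line 19: output the intersection
        if d == n:
            out_q.put(term)
    out_q.put(DONE)


def ixu_stage(in_q, result):
    """IXU: translates each index term (f, c, s) to its MAU f on arrival."""
    while (term := in_q.get()) is not DONE:
        result.add(term[0])


def run_pipeline(predicates, index):
    q1, q2, maus = queue.Queue(), queue.Queue(), set()
    threads = [
        threading.Thread(target=sm_stage, args=(predicates, index, q1)),
        threading.Thread(target=smip_stage, args=(len(predicates), q1, q2)),
        threading.Thread(target=ixu_stage, args=(q2, maus)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return maus
```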

Record insertion requires three major operations. First, the MAU which is to contain the record is chosen. Then, the information in the structure memory is updated. Finally, the record is placed in the mass memory. The last two operations are carried out by the structure memory and the mass memory, and we shall discuss them later in the paper.

The MAU selection operation is a two-step process. First, all MAUs satisfying the mandatory clustering condition (MCC) associated with the record are found; second, these MAUs are examined to find the one with the greatest optional clustering weight. Since both the MCC and the set of optional clustering conditions (OCCs) are queries made of clustering keyword predicates, the SM, SMIP and IXU can once again be used in parallel to extract the required MAU address(es).

An algorithm which does this is given in Figure 8. In this algorithm the mandatory clustering condition for the record to be inserted is represented by

$$\mathrm{MCC} = \mathrm{MCQ}_1 \lor \mathrm{MCQ}_2 \lor \cdots \lor \mathrm{MCQ}_m,$$

where each $\mathrm{MCQ}_i$ is a conjunct of clustering keyword predicates of the form

$$\mathrm{CKP}^i_1 \land \mathrm{CKP}^i_2 \land \cdots \land \mathrm{CKP}^i_p.$$

There are n optional clustering conditions $\mathrm{OCC}_1, \mathrm{OCC}_2, \ldots, \mathrm{OCC}_n$, each of which is in disjunctive normal form. Each $\mathrm{OCC}_i$ is associated with a cluster weight $cw_i$.

In lines 1-5, the set of index terms (f,c,s) whose cluster component identifies a cluster satisfying the conjuncts of the MCC is determined. In line 6, all of these index terms are combined into one set ω, and in line 7 the MAUs are extracted and placed in ω'. The SM and SMIP are used in line 4 to obtain the index terms, while the IXU is used in line 7 to extract the MAUs. In lines 9 through 12, the OCCs are processed in a similar fashion by the SM, SMIP and IXU to produce the set θ. In lines 13 and 14 the total cluster weight associated with each f in ω' is calculated. Finally, in line 15 the MAU with the largest weight is chosen.

Like the record retrieval algorithm, this algorithm requires three basic operations: directory entry retrieval, set intersection and index translation.
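The selection logic of Figure 8 can be sketched as follows. The sketch assumes that conditions are given in disjunctive normal form as lists of predicate names, that a callable `sm` returns the index terms for one clustering keyword predicate, that an MAU's weight is the sum of $cw_i$ over the OCCs it satisfies, and that ties are broken by the smallest MAU address; `satisfying_terms` and `choose_mau` are hypothetical names.

```python
from typing import Callable, Iterable, Optional

IndexTerm = tuple[str, str, str]  # (f, c, s)
DNF = list[list[str]]             # disjunction of conjuncts of predicate names


def satisfying_terms(dnf: DNF, sm: Callable[[str], Iterable[IndexTerm]]) -> set[IndexTerm]:
    """Index terms satisfying a DNF condition: intersect the SM outputs within
    each conjunct (the SMIP's job) and take the union over the conjuncts."""
    result: set[IndexTerm] = set()
    for conjunct in dnf:
        sets = [set(sm(t)) for t in conjunct]
        if sets:
            result |= set.intersection(*sets)
    return result


def choose_mau(mcc: DNF, occs: list[tuple[DNF, int]],
               sm: Callable[[str], Iterable[IndexTerm]]) -> Optional[str]:
    # Lines 1-7: MAUs f of the index terms whose clusters satisfy the MCC.
    omega_prime = {f for f, c, s in satisfying_terms(mcc, sm)}
    if not omega_prime:
        return None
    # Lines 9-12: for each OCC_i, the set of MAUs satisfying it, with cw_i.
    theta = [({f for f, c, s in satisfying_terms(dnf, sm)}, cw) for dnf, cw in occs]

    # Lines 13-14: total cluster weight of a candidate MAU.
    def weight(f: str) -> int:
        return sum(cw for maus, cw in theta if f in maus)

    # Line 15: the heaviest MAU; ties go to the smallest address (an assumption).
    return max(sorted(omega_prime), key=weight)
```

If the MCC admits MAUs f1, f2 and f3, and f3 satisfies two OCCs with weights 5 and 3 while f2 satisfies only the first, `choose_mau` returns f3.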

Figure 8. An Algorithm to Choose an MAU for a Record

The process of physically adding a record to, or removing a record from, the mass memory is quite simple due to its content addressability. The deletion process may, however, also require modification of the structure information in the SM. This will occur whenever the deleted record is the last record in its cluster that contains a particular keyword. When a record R in cluster c, security atom s and MAU f is removed from $M_f$, the index term (f,c,s) must be removed from each directory entry D(F,K) in the SM where K is a keyword appearing in R but in no other record in the cluster c. To handle this operation we provide the mass memory with the capability of determining whether or not a keyword appears in more than one record of a cluster. Full details of the insertion and deletion algorithms are discussed in Parts II and III of this paper.
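The directory maintenance needed on deletion can be sketched as below. Records are modeled as sets of keywords and the directory as a dictionary from keywords to sets of index terms; both layouts and the function name `delete_record` are hypothetical. The set difference computes exactly the keywords that appear in R but in no other record of cluster c.

```python
IndexTerm = tuple[str, str, str]


def delete_record(cluster_records: list[frozenset], r: frozenset,
                  directory: dict[str, set[IndexTerm]],
                  f: str, c: str, s: str) -> None:
    """Remove record r (MAU f, cluster c, security atom s) and update the SM.

    cluster_records: the records of cluster c, each a frozenset of keywords.
    directory: keyword K -> index terms of D(F, K)."""
    cluster_records.remove(r)                       # physical removal from M_f
    remaining = set().union(*cluster_records)       # keywords still in cluster c
    for k in r - remaining:                         # keywords unique to r
        directory.get(k, set()).discard((f, c, s))  # drop (f,c,s) from D(F, K)
```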
4. THE TECHNOLOGY OF THE DBC


The DBCCP, SFP and IXU are conventional processors that would be specially microprogrammed for their tasks. The SMIP and the MM employ currently available technology in a new way. The SM can also be built with available technology, but the most powerful SM organizations employ new technology that will become available in the near future.


The SM is the component most dependent on technological developments. Its PCAM could be built today by using a fixed-head disk as the storage medium. Each block of the PCAM would be stored on one or more tracks of the disk. The memory would be accessed by reading and searching the track(s) representing a block. This organization would have two limitations. First, the block access time would be relatively slow (5 ms or greater); this is a potential system bottleneck. Second, the PCAM would consist of many relatively small blocks, and so only equality predicates could be readily handled by the SM. The small block size implies small hash-table buckets, which in turn implies that the hash function is useful only for exact-match searches; an inequality search would require access to a large number of small blocks.
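The cost asymmetry between equality and inequality predicates can be made concrete. This sketch assumes that each keyword is placed in a small block by hashing its value, with Python's `hash()` standing in for the SM's hash function; the function names are hypothetical.

```python
def blocks_for_equality(value: int, n_blocks: int) -> set[int]:
    """Exact match: the hash of the keyword value names the single block to read."""
    return {hash(value) % n_blocks}


def blocks_for_inequality(keywords, bound: int, n_blocks: int) -> set[int]:
    """Predicate 'value < bound': hashing scatters the qualifying keywords,
    so every block holding one of them has to be read."""
    return {hash(k) % n_blocks for k in keywords if k < bound}
```

With the 1,000 keyword values 0, 10, ..., 9990 spread over 997 blocks, an equality predicate reads one block, whereas `value < 5000` must read 500 blocks: for small integers `hash(k) == k`, and the 500 qualifying multiples of 10 fall into distinct blocks modulo 997.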

The rapid development of electronic bulk memory technologies (CCDs and RAMs [16], magnetic bubbles [17,18] and electron-beam memories [19]) may make an all-electronic replacement for the fixed-head disk available very soon. This would allow the construction of a much faster PCAM-based SM which would not be a bottleneck. An "electronic-disk" PCAM would still, however, have many small blocks, and so would suffer from the same keyword predicate limitations as a fixed-head disk PCAM.


The availability of cheap and very powerful microprocessors opens the way to a very powerful PCAM organization. This PCAM consists of a small number of very large content-addressable blocks and is realized by a large number of microprocessor-memory pairs, as shown in Figure 9. This kind of PCAM would be capable of supporting a much greater variety of keyword predicates, because all keywords of a given attribute could probably be stored in a single PCAM block; any predicate could therefore be applied to all keywords of that attribute with a single access. Since almost all keyword predicates used in a database system would, in all likelihood, deal with only a single keyword attribute, it follows that most keyword predicates likely to appear in a query would be processable by the SM.
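A sketch of the large-block organization follows. It assumes one block per attribute, with each block's entries spread over a fixed number of processor-memory pairs; the class `LargeBlockPCAM` is hypothetical, and the pairs are scanned sequentially here although the hardware would search them in parallel. Because every keyword of an attribute sits in one block, a range or any other predicate on that attribute costs a single block access.

```python
from typing import Callable


class LargeBlockPCAM:
    """Sketch of an SM PCAM whose blocks are keyed by attribute name.

    Each block is a list of 'processor-memory pairs', modeled as plain lists
    of (value, index_term) entries."""

    def __init__(self, pairs_per_block: int) -> None:
        self.pairs_per_block = pairs_per_block
        self.blocks: dict[str, list[list[tuple]]] = {}

    def insert(self, attribute: str, value, term: tuple) -> None:
        block = self.blocks.setdefault(
            attribute, [[] for _ in range(self.pairs_per_block)])
        # Place the entry in the least-loaded pair of the attribute's block.
        min(block, key=len).append((value, term))

    def search(self, attribute: str, predicate: Callable) -> set:
        """One block access evaluates any predicate on the attribute's values."""
        block = self.blocks.get(attribute, [])
        return {t for pair in block for v, t in pair if predicate(v)}
```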

The SMIP (see Figure 10) is primarily a processing element and is consequently not limited by memory technology. The small amount of memory required by this component can be realized with current technology. The SMIP achieves very high speed by using many processor-memory pairs to execute operations in parallel. The SMIP is feasible with today's technology and could become quite inexpensive in the future as RAMs and microprocessors become cheaper.

The mass memory (see Figure 11) uses a moving-head disk to realize a PCAM. Each cylinder of the disk represents one PCAM block. For high performance, all of the data on a cylinder is accessed in parallel, searched, and stored in a buffer in a single disk revolution. The mass memory therefore uses a cylinder-size interleaved buffer memory, multiple read/write assembly registers and a fast processing unit. This can be done with current technology.

A detailed description of the logical and physical structure of the computer is given in Parts II and III.

For maximum applicability to current computer systems we designed an independent DBC with a very simple interface. The DBC directly implements the attribute-based data model and a very powerful query language based on Boolean expressions of keyword predicates. The DBC supports a security specification called a database capability. This construct allows access privileges to be given to sets of records named by queries. The power of the database capability comes from its ability to protect any data item that is retrievable.

Figure 11. The Architecture of the MM

The theory of the DBC's operation was based on clusters of records of like properties stored in MAUs (Minimal Access Units). The structure memory contained directory entries that enabled the DBC to determine the MAUs where records characterized by keywords were to be found in the database. The nature of the directory entries and the structure of the queries determined the properties of the access algorithms. These properties, in turn, influenced the structure of the machine. They showed us the need for a high-speed set manipulation processor, a structure memory that could process keyword predicates, and a mass memory that could properly support MAUs. They also showed us the need for a way to manage the MAUs of the database. To meet this last requirement the DBC supports a sophisticated clustering mechanism that allows records to be automatically assigned to MAUs. The other requirements were met by a specialized component (the SMIP) to perform set operations, a PCAM-based structure memory with very large-capacity partitions that could use hashing to handle a broad class of keyword predicates, and a mass memory in which MAUs were partitioned into clusters to distinguish records with different sets of properties from one another. Security enforcement was realized by two mechanisms. The first mechanism, introduced to enhance performance, used the concept of security atoms to form clusters of records that were protected in the same way. The second mechanism (the security processor) used the actual file sanctions to enforce security.

These two types of security mechanisms allow security specifications to be readily processed and thoroughly enforced.

Parts II and III of this paper give detailed specifications of the data and instruction formats of the database computer and its components, the structure, speeds and capacities of the components, and the technology required to build the machine. It will be shown there that the architectural principles used in the database computer do not require distant technology and so can be realized in the near future. A preliminary study of how the DBC should support higher-level data models (such as the network model) is under way. Early work shows that the proposed DBC can indeed support higher-level data models.